30 research outputs found

    Protein surface representation and analysis by dimension reduction

    Background: Protein structures are better conserved than protein sequences, and consequently more functional information is available in structures than in sequences. However, proteins generally interact with other proteins and molecules via their surface regions and a backbone-only analysis of protein structures may miss many of the functional and evolutionary features. Surface information can help better elucidate proteins' functions and their interactions with other proteins. Computational analysis and comparison of protein surfaces is an important challenge to overcome to enable efficient and accurate functional characterization of proteins. Methods: In this study we present a new method for representation and comparison of protein surface features. Our method is based on mapping the 3-D protein surfaces onto 2-D maps using various dimension reduction methods. We have proposed area and neighbor based metrics in order to evaluate the accuracy of this surface representation. In order to capture functionally relevant information, we encode geometric and biochemical features of the protein, such as hydrophobicity, electrostatic potential, and curvature, into separate color channels in the 2-D map. The resulting images can then be compared using efficient 2-D image registration methods to identify surface regions and features shared by proteins. Results: We demonstrate the utility of our method and characterize its performance using both synthetic and real data. Among the dimension reduction methods investigated, SNE, LandmarkIsomap, Isomap, and Sammon's mapping provide the best performance in preserving the area and neighborhood properties of the original 3-D surface. The enriched 2-D representation is shown to be useful in characterizing the functional site of chymotrypsin and able to detect structural similarities in heat shock proteins. A texture mapping using the 2-D representation is also proposed as an interesting application to structure visualization.

    LFM-Pro: a tool for detecting significant local structural sites in proteins

    Motivation: The rapidly growing protein structure repositories have opened up new opportunities for discovery and analysis of functional and evolutionary relationships among proteins. Detecting conserved structural sites that are unique to a protein family is of great value in identification of functionally important atoms and residues. Currently available methods are computationally expensive and fail to detect biologically significant local features

    Sequence Alignment Reveals Possible MAPK Docking Motifs on HIV Proteins

    Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by the human mitogen-activated protein kinases (MAPKs) ERK1 and ERK2. MAPKs are known to phosphorylate their substrates by first binding to them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, appear on all HIV Nef strains, and are absent from HIV Rev. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV-specific targeting using small-molecule drugs.

    Gene Signature Reveals Decreased SOX10-Dependent Transcripts in Malignant Cells From Immune Checkpoint Inhibitor-Resistant Cutaneous Melanomas

    Evidence is mounting for cross-resistance between immune checkpoint and targeted kinase inhibitor therapies in cutaneous melanoma patients. Since the loss of the transcription factor SOX10 causes tolerance to MAPK pathway inhibitors, we used bioinformatic techniques to determine whether reduced SOX10 expression/activity is associated with immune checkpoint inhibitor resistance. We integrated SOX10 ChIP-seq, knockout RNA-seq, and knockdown ATAC-seq data from melanoma cell models to develop a robust SOX10 gene signature. We used computational methods to validate this signature as a measure of SOX10-dependent activity in independent single-cell and bulk RNA-seq SOX10 knockdown, cell line panel, and MAPK inhibitor drug-resistant datasets. Evaluation of patient single-cell RNA-seq data revealed lower levels of SOX10-dependent transcripts in immune checkpoint inhibitor-resistant tumors. Our results suggest that SOX10-deficient melanoma cells are associated with cross-resistance between targeted and immune checkpoint inhibitors and highlight the need to identify therapeutic strategies that target this subpopulation.

    MicroarrayDesigner: an online search tool and repository for near-optimal microarray experimental designs

    Background: Dual-channel microarray experiments are commonly employed for inference of differential gene expressions across varying organisms and experimental conditions. The design of dual-channel microarray experiments that can help minimize the errors in the resulting inferences has recently received increasing attention. However, a general and scalable search tool and a corresponding database of optimal designs were still missing. Description: An efficient and scalable search method for finding near-optimal dual-channel microarray designs, based on a greedy hill-climbing optimization strategy, has been developed. It is empirically shown that this method can successfully and efficiently find near-optimal designs. Additionally, an improved interwoven loop design construction algorithm has been developed to provide an easily computable general class of near-optimal designs. Finally, in order to make the best results readily available to biologists, a continuously evolving catalog of near-optimal designs is provided. Conclusion: A new search algorithm and database for near-optimal microarray designs have been developed. The search tool and the database are accessible via the World Wide Web at http://db.cse.ohio-state.edu/MicroarrayDesigner. Source code and binary distributions are available for academic use upon request.

    Weighted set enrichment of gene expression data

    Similarity search in protein sequence databases using metric access methods

    The rapid increase in the size of biological sequence data, owing to advances in high-throughput sequencing techniques, and the increased complexity of hypothesis-driven exploration of this data, which requires a massive number of similarity queries, call for new approaches to managing sequence databases and analyzing this information. The metric space representation for sequences is suitable for similarity search and provides several sophisticated metric-indexing techniques. In this work, we provide a thorough survey and analysis of the application of metric access methods to similarity search in protein sequence databases. A framework supporting application of different metric space indexing methods is developed, and a non-redundant sequence database is used to benchmark different methods in terms of the number of distance computations incurred and the computation time required during the database compilation and query phases. The parameters of each method are optimized on a subset of experimental conditions. We demonstrate that Onion-Tree, a hybrid metric access method, performs the best in both index building and querying phases for the protein database investigated, and scales well for large databases, incurring distance computations with 0.5% of the database sequences per query.

    Approximate similarity search in genomic sequence databases using landmark-guided embedding

    Similarity search in sequence databases is of paramount importance in bioinformatics research. As the size of the genomic databases increases, similarity search of proteins in these databases becomes a bottleneck in large-scale studies, calling for more efficient methods of content-based retrieval. In this study, we present a metric-preserving, landmark-guided embedding approach to represent sequences in the vector domain in order to allow efficient indexing and similarity search. We analyze various properties of the embedding and show that the approximation achieved by the embedded representation is sufficient to achieve biologically relevant results. The approximate representation is shown to provide several orders of magnitude speed-up in similarity search compared to the exact representation, while maintaining comparable search accuracy.
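
    The general idea of a landmark-based embedding can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the distance function, how landmarks are chosen, or how the vectors are indexed. The sketch assumes plain edit distance as the metric and a hand-picked landmark list, and all function names (`edit_distance`, `embed`, `chebyshev`, `filter_candidates`) are hypothetical. Each sequence is mapped to its vector of distances to the landmarks; by the triangle inequality, the largest coordinate difference between two such vectors never exceeds the true distance, so it can serve as a cheap filter before exact comparison.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (a true metric), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def embed(seq: str, landmarks: Sequence[str]) -> List[int]:
    """Map a sequence to its vector of distances to each landmark."""
    return [edit_distance(seq, lm) for lm in landmarks]


def chebyshev(u: Sequence[int], v: Sequence[int]) -> int:
    """Largest coordinate difference; a lower bound on the true distance."""
    return max(abs(x - y) for x, y in zip(u, v))


def filter_candidates(query: str, database: Sequence[str],
                      landmarks: Sequence[str], radius: int) -> List[str]:
    """Keep sequences whose embedded distance to the query is within radius.

    Nothing truly within `radius` is discarded, because the embedded
    distance never overestimates the edit distance. In a real system the
    database vectors would be precomputed and indexed once, not per query.
    """
    q_vec = embed(query, landmarks)
    return [s for s in database
            if chebyshev(q_vec, embed(s, landmarks)) <= radius]
```

    In this sketch the surviving candidates would still need to be checked with the exact distance; an approximate search, as described in the abstract, would instead accept the embedded distances directly.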